================================================================================
README — Replication Package
================================================================================

Paper:
  "The dissemination–conversion tradeoff in social campaigns for
   stigmatized behaviors"

Authors:
  Alexander Moldavski (1), Artur Obminski (2,3), Alexandra Avdeenko (4),
  Yagan Hazard (5), Andreas Ette (6), Andreas Meyer-Lindenberg (1),
  Nicolas Rusch (7), C. Katharina Spiess (6), Luc Behaghel (2)

  (1) Central Institute of Mental Health, Mannheim, Germany
  (2) Paris School of Economics, Paris, France
  (3) Paris 1 Pantheon-Sorbonne University, Paris, France
  (4) World Bank, Washington, DC, USA
  (5) Collegio Carlo Alberto, Turin, Italy
  (6) Federal Institute for Population Research (BiB), Wiesbaden, Germany
  (7) University of Ulm, Ulm, Germany

Corresponding author:
  Luc Behaghel — luc.behaghel@psemail.eu

Journal: Science

AEA RCT Registry: AEARCTR-13354

================================================================================
OVERVIEW
================================================================================

This package contains the code required to reproduce all tables and
figures in the paper and supplementary materials. Note that only part of the 
data for the replication can be shared (see Data Sources and Availability below). 

The study combines two experimental designs:

  (A) Depsy bot field experiment. Ukrainian refugees in Germany were recruited
      via a purpose-built Telegram bot ("Depsy") that provided information about
      mental-health resources. Bot users were randomized to one of five arms
      (Z = 0, ..., 4) varying the type of video content promoting help-seeking:
      control (text only), peer × non-ally, peer × ally, celebrity × non-ally,
      and celebrity × ally. The primary outcome is calls to a partnered mental-
      health hotline. N ≈ 1,993 bot users + 92 hotline callers.

  (B) BiB/FReDA-UKR within-survey experiment. A random subsample of the
      population-representative panel of Ukrainian refugees in Germany was
      randomized during the intervention wave to watch a peer or celebrity
      testimonial, or to receive no video (control). Outcomes include attitudes
      toward mental-health help-seeking and reported help-seeking behavior,
      measured immediately (N = 2,819) and at a three-month follow-up
      (N = 2,237).

The paper presents a structural dissemination–conversion model, reduced-form
regressions, and a simulation of the dissemination–conversion tradeoff.

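In addition, the supplementary materials analyze anonymized conversations from
a Telegram support-group chat and prompts to a community bot (data source 4
below).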


================================================================================
SOFTWARE REQUIREMENTS
================================================================================

Statistical software:
  Stata 15 or later. Do-files declare "version 15.0" for reproducibility.
  
Required user-written Stata packages:
  The following packages must be installed. do/0. batch.do attempts to install
  them automatically via SSC if internet access is available. They may also be
  installed manually before running the batch file.

    ssc install unique
    ssc install rangejoin
    ssc install rangestat
    ssc install swindex
    ssc install fre
    ssc install labutil

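No other software is required for the main analyses: all tables and figures of
the two experiments are produced within Stata. The supplementary Telegram
analyses additionally require R (see R-Script Descriptions below).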

================================================================================
DATA SOURCES AND AVAILABILITY
================================================================================

Four data sources are used:

1. Depsy bot monitoring data (provided in this package)
   Source: Dashboard exports from the Depsy Telegram bot platform.
   Format: CSV (raw dashboard exports) and DTA (pre-processed intermediates).
   Privacy: Telegram user IDs are hashed; no personally identifiable
   information is included.
   Key files:
     visits.csv          — one row per user action on the bot
     users.csv           — one row per registered user
     links.csv           — link click events
     block_discoveries.csv — block events (privacy/block actions)
     contact_requests.csv — contact and contact-request events
     links_with_visit_stats.csv — link metadata with visit counts
     contact_nophones.dta — anonymized version of the original contact.csv file,
                            which included phone numbers


2. REDCap survey data (provided in this package)
   Source: REDCap (Research Electronic Data Capture) platform. Three
   instruments were administered:
   a. Bot baseline survey — administered to Depsy bot users before treatment
        data/baseline survey/BaselineSurveyOfTele_DATA_2025-07-17_1113.csv
   b. Hotline research survey — administered by hotline counselors to callers
        data/hotline/Socialmediaexperimen_DATA_NOHDRS_2025-07-17_1004.dta
          (anonymized version of Socialmediaexperimen_DATA_NOHDRS_2025-07-17_1004.csv)
   c. Within-survey hotline callers — administered during the within-survey
      hotline follow-up experiment
        data/within survey hotline/HotlineDataCollectio_DATA_NOHDRS_2025-04-07_1023.csv
        data/within survey hotline/HotlineDataCollectio_STATA_2025-04-07_1023.do
          (auto-generated Stata labeling do-file produced by REDCap)
   Note on filenames: REDCap export filenames contain the download timestamp.
   If the data are re-downloaded from REDCap, filenames in the do-files must
   be updated to match (see Notes section below).

3. BiB/FReDA-UKR panel data (NOT PROVIDED IN THIS PACKAGE)
   Source: Federal Institute for Population Research (BiB), Germany.
   Population-representative panel of Ukrainian refugees in Germany.
   Files:
     intervention_suf.dta     — intervention wave survey responses
     w4_suf.dta               — wave 4 (three-month follow-up) responses
     wave1_suf.dta            — wave 1 baseline responses
     wave1_education.dta      — wave 1 education supplement
     language.dta             — language variable from the panel
     intervention_timestamps.dta — timestamps of intervention survey responses
   
   Access restrictions: Access to the BiB/FReDA-UKR panel data is offered by the
   BiB Research Data Centre, which is responsible for official data
   dissemination and access procedures (https://www.bibb.de/en/53.php).


4. Telegram chat and bot data (NOT PROVIDED IN THIS PACKAGE)
   Source: Telegram group chat export (JSON) and Manja bot prompt log (TXT).
   These files underlie the descriptive analysis of community information-seeking
   behavior presented in the supplementary materials.
   Files:
     data/telegram/telegram_chat_mannheim.json
                             — full export of the Mannheim Ukraine support group chat
     data/telegram/Manja1125.txt
                             — log of 12,636 successful user prompts to the Manja
                               community bot (Mannheim/Ludwigshafen, 2024–2025)
   
   Privacy: The JSON chat export contains visible Telegram display names. These
   are used only in aggregate (author-level counts by topic) and are not linked
   to any individual-level experimental or survey data.	

================================================================================
FILE STRUCTURE
================================================================================

Replication package MHU/
|
+-- README.txt                        This file
+-- codebook.xlsx                     Codebook
|
+-- paper/                            Paper
|
+-- questionnaires/                   Survey instruments
|
+-- do/                               Stata do-files (see descriptions below)
|   +-- 0. batch.do                   Master script
|   +-- 1. data 1 - bot.do
|   +-- 1. data 2 - hotline research survey.do
|   +-- 1. data 3 - bot baseline survey.do
|   +-- 1. data 4 - hotline within survey experiment.do
|   +-- 1. data 5 - merge bot - hotline - baseline.do
|   +-- 1. data 6 - within survey.do
|   +-- 2. analysis - dissemination model and reduced form.do
|   +-- 2. analysis - within survey.do
|   +-- 3. simulation -- dissemination conversion tradeoff.do
|
+-- data/                             Input data (see Data Sources above)
|   +-- baseline survey/
|   +-- hotline/
|   +-- monitoring/
|   +-- within survey hotline/
|
+-- graph/                            Output directory for figures (created by code)
+-- log/                              Output directory for logs and table exports
+-- tables/
|   +-- tables Science paper.xlsx     Final compiled tables (manually assembled)
+-- r/                                R scripts for supplementary analyses
|   +-- Telegram_chat_analysis_submission.R
|   +-- Telegram_Manja_bot_analysis.R



================================================================================
DO-FILE DESCRIPTIONS
================================================================================

do/0. batch.do
  Master script. Sets global macros ($path, $data, $do), installs required SSC
  packages, and calls all data-preparation and analysis do-files in sequence.
  NOTE: The $path macro is currently hardcoded and must be updated before
  running (see Replication Instructions below).
  
do/1. data 1 - bot.do
  Imports and merges Depsy bot monitoring CSVs (visits.csv, users.csv,
  links.csv, block_discoveries.csv). Constructs user-level and action-level
  datasets. Uses hashed Telegram IDs (true_user_id) as the common identifier.
  Outputs:
    $path/data/depsy user activity - user level.dta
    $path/data/depsy user activity - detailed action level.dta

do/1. data 2 - hotline research survey.do
  Imports the hotline research REDCap CSV. Cleans and labels PHQ-4, PCL-5,
  demographic, and help-seeking variables. Creates mental-health condition
  indicators (MHcondition, anxiety, depression).
  Output: hotline research dataset (intermediate DTA, path in $path/data/).

do/1. data 3 - bot baseline survey.do
  Imports the bot baseline REDCap CSV. Cleans PHQ-4, PTSD, and demographic
  variables. Drops 11 identified duplicate records. Creates MHcondition,
  anxiety, and depression indicators.
  Outputs:
    $path/data/baseline_covariates.dta
    $path/data/baseline_cleaned.dta

do/1. data 4 - hotline within survey experiment.do
  Imports the within-survey hotline REDCap CSV. Creates synthetic user IDs for
  within-survey callers and assigns experimental arm (randomization_id 21 or 22)
  by phone number.
  Output:
    $path/data/within survey hotline callers.dta

do/1. data 5 - merge bot - hotline - baseline.do
  Merges all bot-experiment data sources: hotline call records, bot monitoring
  data, and bot baseline survey. Defines treatment arms (Z = 0 control/text,
  Z = 1 peer x non-ally, Z = 2 peer x ally, Z = 3 celebrity x non-ally,
  Z = 4 celebrity x ally). Defines analytical samples (SAMPLE4: 1,993 bot
  users; SAMPLE3: hotline callers with MH status and origin information).
  Outputs:
    $path/data/depsy user activity - user level - with calls.dta
    $path/data/depsy user activity - user level - with calls and baseline.dta

do/1. data 6 - within survey.do
  Prepares the BiB/FReDA-UKR panel for analysis. Merges intervention wave,
  wave 4 follow-up, wave 1 baseline, and education data. Constructs baseline
  covariates (age, sex, education, region of origin, language, family
  structure). Creates merged long-format dataset (individual x wave).
  Outputs:
    $path/data/wave1_complete.dta
    $path/data/wave1_baseline_var.dta
    $path/data/w4_suf2.dta
    $path/data/merged_within0.dta
    $path/data/analysis_data_within_survey.dta

do/2. analysis - dissemination model and reduced form.do
  Main analysis for the Depsy bot field experiment. Implements:
    - Reduced-form regressions (Table S6)
    - Structural dissemination-conversion model estimation via method of moments
    - Randomization inference with 2,000 re-randomization draws (seed 20251223)
      for all structural parameters and their pairwise differences across arms
    - Two over-identifying restriction tests (beta1 constancy across arms;
      alternative identification of gamma via survey sharing rates)
  Outputs (to $path/log/):
    Table 1.xlsx     — Table 1 (structural model, pooled and by arm)
    Table S4.xlsx    — Table S4 (arm-level structural estimates)
    Table S6.xlsx    — Table S6 (reduced-form regressions)
    detailed results randomization inference - *.dta — intermediate RI results
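
  The randomization-inference step can be illustrated with a toy example. The
  sketch below is not the package code: it uses simulated data, a single
  binary treatment, and a difference in means, whereas the do-file covers five
  arms and structural parameters. All variable names (z, y, z_perm) and the
  number of draws (200) are assumptions made for illustration.

```stata
* Toy randomization-inference sketch on simulated data (not the package code).
version 15.0
clear
set seed 20251223
set obs 500
generate byte z = runiform() < 0.5                 // hypothetical treatment
generate byte y = runiform() < (0.05 + 0.03*z)     // hypothetical outcome

* Observed difference in means between treated and control
quietly regress y z
scalar b_obs = _b[z]

* Re-randomize treatment, holding the number treated fixed, and count how
* often the placebo estimate is at least as large in absolute value.
quietly count if z == 1
local n1 = r(N)
local reps 200
local extreme 0
forvalues r = 1/`reps' {
    quietly {
        generate double u = runiform()
        sort u
        generate byte z_perm = _n <= `n1'
        regress y z_perm
        if abs(_b[z_perm]) >= abs(scalar(b_obs)) local ++extreme
        drop u z_perm
    }
}
display "RI p-value: " %6.4f `extreme'/`reps'
```

  The p-value is the share of re-randomization draws whose statistic is at
  least as extreme as the observed one, which is why it moves slightly when
  the seed or the number of draws changes.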

do/2. analysis - within survey.do
  Analysis for the BiB/FReDA-UKR within-survey experiment. Produces:
    - Descriptive statistics comparing bot and within-survey samples
    - Propensity score for bot use (logit), used for reweighting
    - Reweighted OLS regressions of treatment (peer/celebrity video) on
      attitudes (barrier index, stigma toward help-seeking, stigma toward MH,
      intention to reach out, help-seeking index) for both the immediate
      (intervention wave) and three-month (wave 4) samples
  Outputs (to $path/log/):
    Table S1.xlsx    — Table S1 (descriptive statistics)
    Table S2.xlsx    — Table S2 (control group descriptives)
    Table_S6.xlsx    — Table S5 (impact of video treatments)
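
  The reweighting logic described above can be sketched as follows. This is a
  hedged illustration on simulated data, not the package code: the variable
  names (bot, age, female, treat, y) are hypothetical, and the odds weight
  ps/(1-ps) is one common choice assumed here, not necessarily the weight used
  in the do-file.

```stata
* Toy propensity-score reweighting sketch (simulated data, assumed weights).
version 15.0
clear
set seed 12345
set obs 1000
generate byte bot = _n <= 400                  // 1 = bot user, 0 = survey
generate age = 18 + floor(50*runiform())
generate byte female = runiform() < 0.6

* Propensity score for bot use, estimated on the pooled sample
logit bot age female
predict double ps, pr

* Odds weights make the survey sample resemble bot users on covariates
generate double w = ps/(1 - ps) if bot == 0

* Hypothetical treatment and outcome, defined in the survey sample only
generate byte treat = runiform() < 0.5 if bot == 0
generate double y = rnormal() + 0.2*treat if bot == 0

* Reweighted OLS of the outcome on treatment
regress y treat if bot == 0 [pweight = w], vce(robust)
```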

do/3. simulation -- dissemination conversion tradeoff.do
  Produces a simulation of the dissemination–conversion tradeoff across arm
  types, using the estimated structural parameters. Generates Figure 2.
  Output:
    simul_dissemination.pdf — Figure 2


================================================================================
R-SCRIPT DESCRIPTIONS
================================================================================

r/Telegram_chat_analysis_submission.R
  Reads the full Telegram group chat JSON export. Classifies messages into 18
  topic categories (mental health, other medical, housing, authorities, etc.)
  using multilingual keyword dictionaries (Russian/Ukrainian/German/English).
  Produces topic frequency tables and bar charts, and disaggregates posts by
  admin vs. non-admin authors.
  Input:  data/telegram/telegram_chat_mannheim.json
  Outputs (to out_simple/):
    topic_totals_en.csv
    01_topics_counts_en.png
    02_topics_share_of_all_posts_en.png
    mh_authors_by_from.csv
    mh_posts_by_from_full.csv
    admin_share_by_topic_en.csv
    03_admin_share_by_topic_en.png
  Note: Update FILE_JSON path at the top of the script before running.

r/Telegram_Manja_bot_analysis.R
  Reads the Manja bot prompt log (TXT). Strips bot-address prefixes ("Маня,")
  and classifies each prompt into the same 18 topic categories as the chat
  analysis script. Produces prompt-level frequency tables and bar charts.
  Input:  data/telegram/Manja1125.txt
  Outputs (to out_bot/):
    bot_topic_totals_en.csv
    01_bot_topics_counts_en.png
    02_bot_topics_share_en.png
  Note: Update BOT_TXT_FILE path at the top of the script before running.
  Required R packages: tidyverse, stringr, glue, scales, readr
    (installed automatically on first run if internet access is available).

================================================================================
REPLICATION INSTRUCTIONS
================================================================================

STEP 1. Update the master path.
  Open do/0. batch.do in a text editor. Locate the line that sets the global
  $path macro (currently: global path="[SET MAIN DIRECTORY HERE]") and
  replace it with the absolute path to the replication package on your machine:

    Windows:  global path "C:\path\to\Replication package MHU"
    Mac/Unix: global path "/path/to/Replication package MHU"

  The globals $data (= "$path/data") and $do (= "$path/do") are derived
  automatically and do not need to be changed.
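
  As a minimal sketch, the path setup can be combined with a sanity check
  before anything else runs. The check is an addition for illustration and is
  not part of the package; the path shown is a placeholder.

```stata
* Placeholder path: replace with the package location on your machine.
global path "/path/to/Replication package MHU"

* These mirror the derived globals described above.
global data "$path/data"
global do   "$path/do"

* Stop early if the master script is not where the path says it should be.
capture confirm file "$do/0. batch.do"
if _rc {
    display as error "0. batch.do not found under $do; check the path global"
    exit 601
}
```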

STEP 2. Install required Stata packages (if not already installed).
  If internet access is available, do/0. batch.do will install the packages
  automatically. To install manually:

    ssc install unique
    ssc install rangejoin
    ssc install rangestat
    ssc install swindex
    ssc install fre
    ssc install labutil
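
  To avoid reinstalling packages that are already present, the manual step can
  be written as a loop. This is a sketch, not the code in the batch file. It
  assumes each package can be detected through one of its commands; labutil
  ships several commands and no labutil.ado, so labmask is used as its marker.

```stata
* Install each required SSC package only if its marker command is missing.
local pkgs unique rangejoin rangestat swindex fre labutil
local cmds unique rangejoin rangestat swindex fre labmask
local n : word count `pkgs'
forvalues i = 1/`n' {
    local pkg : word `i' of `pkgs'
    local cmd : word `i' of `cmds'
    capture which `cmd'
    if _rc {
        display as text "Installing `pkg' from SSC..."
        capture noisily ssc install `pkg'
    }
}
```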

STEP 3. Run the main batch file.
  In Stata, run:

    do "path/to/Replication package MHU/do/0. batch.do"

  This executes all data-preparation do-files (1. data 1 through 1. data 6)
  and both analysis do-files (2. analysis - dissemination model and reduced
  form; 2. analysis - within survey) in sequence. A log file is written for
  each do-file. Outputs are saved to log/ and graph/.

  Expected runtime: The randomization inference loop (2,000 replications in
  do/2. analysis - dissemination model and reduced form.do) is computationally
  intensive. Runtime depends on hardware; expect approximately 15 minutes
  on a standard desktop computer. For testing, reduce the number of replications
  by changing "global rep 2000" to a smaller value (e.g., "global rep 100")
  at the top of that do-file.


STEP 4. Consult the compiled output.
  tables/tables Science paper.xlsx contains the final formatted tables as they
  appear in the paper. These were assembled manually from the Excel files
  written to log/ by the do-files. The correspondence is as follows:

    Table 1        <- log/Table 1.xlsx (columns: Text, Peer, Celebrity,
                                              Ally, Non-Ally)
    Table S1       <- log/Table S1.xlsx
    Table S3       <- log/Table S3.xlsx
    Table S4       <- log/Table S4.xlsx
    Table S5       <- log/Table S5.xlsx
    Table S6       <- log/Table S6.xlsx

  Figures not produced by Stata (Figure 2, Figures S1–S5) were prepared

================================================================================
NOTES
================================================================================

1. Stata version compatibility.
   All do-files open with "version 15.0", which locks syntax to Stata 15 for
   reproducibility. The code runs without modification on Stata 15 through 18.
   Results were verified on Stata 18 MP for macOS.

2. Randomization inference seed.
   The randomization inference in do/2. analysis - dissemination model and
   reduced form.do uses seed 20251223. Results in the paper correspond
   precisely to this seed. Changing the seed will alter p-values slightly
   (within sampling variability of 2,000 draws).

3. REDCap export filenames.
   REDCap embeds the download timestamp in every export filename. The following
   filenames are hardcoded in the do-files and must be updated if the data
   are re-downloaded from REDCap:

   do/1. data 2 - hotline research survey.do:
     "Socialmediaexperimen_DATA_NOHDRS_2025-07-17_1004.csv"

   do/1. data 3 - bot baseline survey.do:
     "BaselineSurveyOfTele_DATA_2025-07-17_1113.csv"

   do/1. data 4 - hotline within survey experiment.do:
     "HotlineDataCollectio_DATA_NOHDRS_2025-04-07_1023.csv"

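   One way to reduce the need for manual edits, shown here only as a sketch,
   is to locate the export by filename pattern instead of hardcoding the
   timestamp. The do-files in this package do not do this; the local macro
   names are hypothetical, and $data is assumed to be set by the master script.

```stata
* Find the baseline survey export regardless of its download timestamp.
local dir "$data/baseline survey"
local files : dir "`dir'" files "BaselineSurveyOfTele_DATA_*.csv"
local n : word count `files'
if `n' != 1 {
    display as error "expected exactly one baseline export in `dir', found `n'"
    exit 601
}
local f : word 1 of `files'
import delimited using "`dir'/`f'", clear
```

   Requiring exactly one match guards against silently reading an outdated
   export when several downloads sit in the same folder.
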


================================================================================
END OF README
================================================================================
